Hand2Face: Automatic Synthesis and Recognition of Hand Over Face Occlusions
A person's face discloses important information about their affective state.
Although there has been extensive research on recognition of facial
expressions, the performance of existing approaches is challenged by facial
occlusions. Facial occlusions are often treated as noise and discarded in
recognition of affective states. However, hand over face occlusions can provide
additional information for recognition of some affective states such as
curiosity, frustration and boredom. One of the reasons that this problem has
not gained attention is the lack of naturalistic occluded faces that contain
hand over face occlusions as well as other types of occlusions. Traditional
approaches for obtaining affective data are time-consuming and expensive, which
limits researchers in affective computing to small datasets. This limitation
affects the generalizability of models and prevents researchers from taking
advantage of recent advances in deep learning that have shown great success in
many fields but require large volumes of data. In this paper, we
first introduce a novel framework for synthesizing naturalistic facial
occlusions from an initial dataset of non-occluded faces and separate images of
hands, reducing the costly process of data collection and annotation. We then
propose a model for facial occlusion type recognition to differentiate between
hand over face occlusions and other types of occlusions such as scarves, hair,
glasses and objects. Finally, we present a model to localize hand over face
occlusions and identify the occluded regions of the face.
Comment: Accepted to the International Conference on Affective Computing and Intelligent Interaction (ACII), 2017
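As an illustration of the synthesis step, the following is a minimal sketch of compositing a segmented hand cutout onto a non-occluded face via alpha blending (Python with Pillow; the function name, paths, and position argument are illustrative, and the paper's framework additionally handles realistic placement, scaling, and appearance matching):

    from PIL import Image

    def synthesize_occlusion(face_path, hand_path, position):
        # Load the face and a hand cutout with a transparent background.
        face = Image.open(face_path).convert("RGBA")
        hand = Image.open(hand_path).convert("RGBA")
        # Alpha-composite the hand over the face at the given (x, y) offset.
        face.alpha_composite(hand, dest=position)
        return face.convert("RGB")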
OpenFace: An open source facial behavior analysis toolkit
Over the past few years, there has been an increased
interest in automatic facial behavior analysis and understanding.
We present OpenFace, an open source tool
intended for computer vision and machine learning researchers,
the affective computing community, and people interested
in building interactive applications based on facial
behavior analysis. OpenFace is the first open source tool
capable of facial landmark detection, head pose estimation,
facial action unit recognition, and eye-gaze estimation.
The computer vision algorithms that form the core of
OpenFace demonstrate state-of-the-art results in all of the
above-mentioned tasks. Furthermore, our tool is capable of
real-time performance and is able to run from a simple webcam
without any specialist hardware. Finally, OpenFace
allows for easy integration with other applications and devices
through a lightweight messaging system.
European Community Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 289021 (ASC-Inclusion)
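As a usage illustration, a minimal sketch of driving OpenFace's FeatureExtraction tool from Python (paths are placeholders; this assumes the binary is on the PATH and uses its -f and -out_dir command-line options):

    import subprocess

    # Run OpenFace's FeatureExtraction on a video; it writes facial
    # landmarks, head pose, action units, and gaze estimates to CSV.
    subprocess.run(
        ["FeatureExtraction", "-f", "input_video.mp4", "-out_dir", "processed"],
        check=True,
    )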
Video-based sympathetic arousal assessment via peripheral blood flow estimation
Electrodermal activity (EDA) is considered a standard marker of sympathetic
activity. However, traditional EDA measurement requires electrodes in steady
contact with the skin. Can sympathetic arousal be measured using only an
optical sensor, such as an RGB camera? This paper presents a novel approach to
infer sympathetic arousal by measuring the peripheral blood flow on the face or
hand optically. We contribute a self-recorded dataset of 21 participants,
comprising synchronized videos of participants' faces and palms and
gold-standard EDA and photoplethysmography (PPG) signals. Our results show that
we can measure peripheral sympathetic responses that closely correlate with the
ground truth EDA. We obtain median correlations of 0.57 to 0.63 between our
inferred signals and the ground truth EDA using only videos of the
participants' palms or foreheads or PPG signals from the foreheads or fingers.
We also show that sympathetic arousal is best inferred from the forehead,
finger, or palm.
Comment: Accepted and to be published in Biomedical Optics Express
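As a sketch of the evaluation idea, the snippet below correlates an optically inferred blood-flow signal with ground-truth EDA after resampling to a common length (names are illustrative; the paper reports median correlations across participants rather than a single score):

    import numpy as np
    from scipy.signal import resample
    from scipy.stats import pearsonr

    def arousal_correlation(inferred, eda):
        # Resample the inferred peripheral blood-flow signal to the
        # length of the EDA recording, then correlate the two.
        inferred = resample(np.asarray(inferred, dtype=float), len(eda))
        r, _ = pearsonr(inferred, np.asarray(eda, dtype=float))
        return r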
A facial affect mapping engine
Facial expressions play a crucial role in human interaction. Interactive digital games can help teach people to both express and recognise them. Such interactive games can benefit from the ability to alter user expressions dynamically and in real time. In this demonstration, we present the Facial Affect Mapping Engine (FAME), a framework for mapping and manipulating facial expressions across images and video streams. Our system is fully automatic, runs in real time, and does not require any specialist hardware. FAME presents new possibilities for the designers of intelligent interactive digital games.
Real-Time Inference of Mental States from Facial Expressions and Upper Body Gestures
We present a real-time system for detecting facial action units and inferring emotional states from head and shoulder gestures and facial expressions. The dynamic system uses three levels of inference on progressively longer time scales. Firstly, facial action units and head orientation are identified from 22 feature points and Gabor filters. Secondly, Hidden Markov Models are used to classify sequences of actions into head and shoulder gestures. Finally, a multi-level Dynamic Bayesian Network is used to model the unfolding emotional state based on the probabilities of different gestures. The most probable state over a given video clip is chosen as the label for that clip. The average F1 score for 12 action units (AUs 1, 2, 4, 6, 7, 10, 12, 15, 17, 18, 25, 26), labelled on a frame-by-frame basis, was 0.461. The average classification rate for five emotional states (anger, fear, joy, relief, sadness) was 0.440. Sadness had the greatest rate, 0.64, and anger the smallest, 0.11.
Thales Research and Technology (UK); Bradlow Foundation Trust; Procter & Gamble Company
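The final labelling step, choosing the most probable state over a clip, can be sketched as follows, assuming per-frame state probabilities from the Dynamic Bayesian Network are already available (names are illustrative):

    import numpy as np

    def clip_label(frame_probs, states):
        # frame_probs: (n_frames, n_states) per-frame posteriors.
        # The clip label is the state with the highest mean probability.
        frame_probs = np.asarray(frame_probs, dtype=float)
        return states[int(np.argmax(frame_probs.mean(axis=0)))]

For example, clip_label(probs, ["anger", "fear", "joy", "relief", "sadness"]) returns the clip-level label used for the classification rates above.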
Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion
This paper presents a 3D generative model that uses diffusion models to
automatically generate 3D digital avatars represented as neural radiance
fields. A significant challenge in generating such avatars is that the memory
and processing costs in 3D are prohibitive for producing the rich details
required for high-quality avatars. To tackle this problem we propose the
roll-out diffusion network (Rodin), which represents a neural radiance field as
multiple 2D feature maps and rolls out these maps into a single 2D feature
plane within which we perform 3D-aware diffusion. The Rodin model brings the
much-needed computational efficiency while preserving the integrity of
diffusion in 3D by using 3D-aware convolution that attends to projected
features in the 2D feature plane according to their original relationship in
3D. We also use latent conditioning to orchestrate the feature generation for
global coherence, leading to high-fidelity avatars and enabling their semantic
editing based on text prompts. Finally, we use hierarchical synthesis to
further enhance details. The 3D avatars generated by our model compare
favorably with those produced by existing generative techniques. We can
generate highly detailed avatars with realistic hairstyles and facial hair like
beards. We also demonstrate 3D avatar generation from image or text as well as
text-guided editability.
Comment: Project Webpage: https://3d-avatar-diffusion.microsoft.com
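A schematic of the roll-out step, under the assumption that the radiance field is stored as several 2D feature planes (e.g., a tri-plane representation): the planes are tiled into one wide 2D map that a single 2D diffusion network can process. The 3D-aware convolution that Rodin applies on top of this plane is omitted here:

    import torch

    def roll_out(feature_maps: torch.Tensor) -> torch.Tensor:
        # Tile (n_planes, C, H, W) feature maps side by side into a
        # single (C, H, n_planes * W) plane for 2D diffusion.
        n, c, h, w = feature_maps.shape
        return feature_maps.permute(1, 2, 0, 3).reshape(c, h, n * w)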
CAM3D
Cam3D consists of 108 labelled videos of 12 mental states, including spontaneous facial expressions and hand gestures. It was labelled using crowd-sourcing (inter-rater reliability κ = 0.45).
We used three different sensors for data collection: Microsoft Kinect sensors, HD cameras, and the microphones in the HD cameras. After the initial data collection, the videos were segmented. Each segment showed a single event, such as a change in facial expression, a head or body posture movement, or a hand gesture. From videos with public consent, a total of 451 segments were collected. The mean segment duration is 6 seconds.
Labelling was based on context-free observer judgment. Public segments were labelled by community crowd-sourcing. Out of the 451 segmented videos, we wanted to extract those that could reliably be described as belonging to one of the 24 emotion groups from the Baron-Cohen taxonomy. Of the 2916 labels collected, 122 did not appear in the taxonomy and so were not considered in the analysis. The remaining 2794 labels were grouped as belonging to one of the 24 groups plus agreement, disagreement, and neutral. To filter out non-emotional segments, we kept only the videos on which 60% or more of the raters agreed. This resulted in 108 segments in total. The most common label given to a video segment was taken as the ground truth.
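The agreement filtering and ground-truth assignment described above amount to a majority vote, sketched below (the segment_labels mapping from segment ids to lists of rater labels is an assumed input format):

    from collections import Counter

    def filter_and_label(segment_labels, min_agreement=0.6):
        # Keep a segment only if at least 60% of raters gave the same
        # label; that most common label becomes the ground truth.
        ground_truth = {}
        for seg, labels in segment_labels.items():
            label, count = Counter(labels).most_common(1)[0]
            if count / len(labels) >= min_agreement:
                ground_truth[seg] = label
        return ground_truth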
The data is categorized by ground-truth label and divided into seven folders. For each video segment, we provide the colour video, camera parameters, colour images, and their corresponding aligned depth images.
Crowdsourcing in Emotion Studies Across Time and Culture
Crowdsourcing is becoming increasingly popular as a cheap and effective tool for multimedia annotation. However, the idea is not new and can be traced back to Charles Darwin. He was interested in studying the universality of facial expressions in conveying emotions, and thus had to consider a global population. Access to different cultures allowed him to reach more general conclusions. In this paper, we highlight a few milestones in the history of the study of emotion that share the concepts of crowdsourcing. We first consider the study of posed photographs and then move to videos of natural expressions. We present our use of crowdsourcing to label a video corpus of natural expressions, and also to recreate one of Darwin's original emotion judgment experiments. This allows us to compare people's perception of emotional expressions in the 19th and 21st centuries, showing that it remains stable across both culture and time.